The fast Fourier transform (FFT) processor is a prevailing tool for converting
signals from the time domain to the frequency domain. This paper provides a
signal-to-noise ratio (SNR) study of a 16-point pipelined FFT processor implemented
on a field-programmable gate array (FPGA). This processor can be used in a wide
range of digital signal applications such as wireless sensor networks, digital video
broadcasting and many more. These applications require accuracy in their
data communication part, which is why SNR is an important analysis. SNR is a
measure of signal strength relative to noise, and is usually expressed in
decibels (dB). Previously, SNR studies have been carried out in software
simulation, for example in Matlab. In this paper, however, the pipelined FFT
and SNR modules are developed in hardware form. The SNR module is designed
in Modelsim using Verilog code before being implemented on the FPGA board. The
SNR module is connected directly to the output of the pipelined FFT module.
Three pipelined FFT processors with different architectures were studied. The
results show that the SNR of the radix-8 and R4SDC FFT architectures is
above 40 dB, which represents an excellent signal. The SNR module on the
FPGA and the SNR results for the different pipelined FFT architectures can be
considered the novelty of this paper.

1. INTRODUCTION

The discrete Fourier transform (DFT) is a well-known algorithm used in digital signal processing
applications. The FFT is the algorithm that results from exploiting the symmetry and periodicity
of the DFT. The FFT reduces the computational complexity of the DFT from $N^2$ to $N \log_2 N$ operations,
based on the Cooley-Tukey algorithm [1]. Equation (1) gives the complex DFT that the FFT computes,
where N is the number of FFT points and k = 0, 1, 2, ..., N-1.

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi nk / N} \qquad (1)$$
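As a point of reference, the DFT definition in Equation (1) can be evaluated directly. The following is a minimal Python sketch, not the paper's implementation; the function name `dft` and the impulse test input are chosen here for illustration. It makes the $N^2$ cost visible: each of the N outputs needs N complex multiply-adds.

```python
import cmath

def dft(x):
    """Direct evaluation of Equation (1): N multiply-adds per output, N^2 in total."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

# A unit impulse has a flat spectrum: every X(k) equals 1.
impulse = [1.0] + [0.0] * 15
spectrum = dft(impulse)
```

An FFT produces the same values as this function while reusing intermediate results, which is where the reduction to $N \log_2 N$ comes from.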

The algorithm proposed by Cooley and Tukey decomposes the N-point DFT into recurrent 2-point DFT
operations, and is called the radix-2 algorithm. It is built on the butterfly block, which consists of an
addition and a subtraction of complex numbers. Higher radices such as radix-4 and radix-8 apply
the same basic algorithm as radix-2 but with fewer multiplications. However, their butterfly
structures are more complex and require multiple-input complex adders [2]. Previously, this algorithm
was implemented in software. With the development of semiconductor technology, however, FFT algorithms
can now be implemented in hardware. This paper proposes pipelined FFT processor and SNR modules
implemented on an FPGA board [3], [4], with the designs simulated in Modelsim.
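The radix-2 decomposition and its butterfly can be illustrated in software. The sketch below is a textbook recursive decimation-in-time form, given only to show the addition and subtraction that make up the butterfly; it is an assumption for illustration and says nothing about how the pipelined hardware in this paper is organized. The name `fft_radix2` is hypothetical.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft_radix2(x[0::2])   # N/2-point DFT of even-indexed samples
    odd = fft_radix2(x[1::2])    # N/2-point DFT of odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]  # twiddle factor W_N^k
        out[k] = even[k] + t            # butterfly addition
        out[k + N // 2] = even[k] - t   # butterfly subtraction
    return out
```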

The pipelined FFT processor is a popular choice in digital signal processing. A pipelined architecture
combines parallelism and pipelining, which makes it very fast. Furthermore, a small number of basic cells can be
used repeatedly, reducing the design complexity [5]. Examples of digital design applications that use pipelined
FFTs are orthogonal frequency division multiplexing (OFDM), code division multiple access (CDMA) and
some wireless receiver processing blocks [6]. Other available FFT architectures are memory-based, cache-memory
and array architectures. To ensure accurate and reliable communication, the SNR [7] of the FFT module
needs to be studied. SNR is the ratio of the signal power to the noise power, expressed in decibels, as given
by Equation (2):

$$SNR_{dB} = 10 \log_{10}\left(\frac{P_{signal}}{P_{noise}}\right) \qquad (2)$$

Ideally, Psignal must be greater than Pnoise to obtain a positive SNR, so that the signal is clearly readable.
If Psignal is equal to Pnoise, the SNR is 0 dB and the signal is almost unreadable; in digital
communication this reduces the data rate and forces the transmitter to resend the data.
The worst case is when Psignal is less than Pnoise, where reliable communication is not possible.
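The three cases above follow directly from Equation (2). A small Python sketch, with illustrative power values that are not taken from the paper:

```python
import math

def snr_db(p_signal, p_noise):
    """Equation (2): SNR in decibels from signal and noise power (same units)."""
    return 10.0 * math.log10(p_signal / p_noise)

equal_power = snr_db(1.0, 1.0)      # signal and noise equally strong: 0 dB
forty_db = snr_db(10_000.0, 1.0)    # a power ratio of 10^4 corresponds to 40 dB
negative = snr_db(0.5, 1.0)         # noise dominates, so the SNR is negative
```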

This paper studies the SNR of pipelined FFT processors. The three types of pipelined FFT discussed in this
paper are radix-8 [1], [8], [9], radix-4 single-path delay feedback (R4SDF) [7], [9], [10] and radix-4
single-path delay commutator (R4SDC) [9], [11].


2. METHODOLOGY 

The research methodology is divided into two stages. The first is the pipelined FFT
design and verification, and the second is the SNR analysis.

2.1. Pipelined FFT Design and Verification

The pipelined FFT and its sub-modules are designed in both Matlab and Modelsim, as shown in Figure 1.
Random data are generated in Matlab as the input x1(n) to the FFT module and converted from floating point to
a digital form. The Matlab output Y1(n) is also converted to a digital value, which can be hexadecimal,
binary or any other digital representation. The digital input is then supplied to the pipelined FFT module for
hardware simulation ahead of FPGA implementation. The output Y2(n) is then compared with Y1(n) for verification,
as shown in Table 1 in the results and discussion section.
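The floating-point to digital conversion described above could look like the following Python sketch. The Q1.15 scaling, the saturation behavior and the helper names `to_q15` and `pack_complex` are assumptions made for illustration; the paper does not state the fixed-point format or the Matlab function used. The word layout (real part in the 16 MSBs, imaginary part in the 16 LSBs) follows the description given with Figure 4.

```python
def to_q15(value):
    """Quantize a float in [-1, 1) to 16-bit two's complement (Q1.15, an assumed format)."""
    q = int(round(value * (1 << 15)))
    q = max(-(1 << 15), min((1 << 15) - 1, q))  # saturate instead of wrapping around
    return q & 0xFFFF

def pack_complex(z):
    """Pack the real part into the 16 MSBs and the imaginary part into the 16 LSBs."""
    return f"{(to_q15(z.real) << 16) | to_q15(z.imag):08X}"

word = pack_complex(complex(-0.5, 0.25))   # "C0002000"
```

Here -0.5 maps to 0xC000 and 0.25 maps to 0x2000, giving an eight-digit hexadecimal word of the same shape as the entries in Table 1.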



Figure 1. (a) FFT module in Matlab (b) Pipelined FFT module in Modelsim


Figure 2 shows an example of the test-bench structure for the pipelined FFT modules, with sample input
and output signals for hardware simulation in Modelsim. Simulation is an important step because it is
where the output of the pipelined FFT is verified against the expected result.

Figure 2. Pipelined FFT testbench


A read-only memory (ROM) is connected to the FFT module inputs. This ROM stores the input data,
which are held in a .txt or .dat file and were obtained from the Matlab simulation. As a benchmark, the pipelined FFT
modules were also designed in Matlab; this Matlab code can be modified to suit any FFT size or
radix. The inputs generated in Matlab are then supplied to the pipelined FFT module in
Modelsim. The counter in the test-bench generates the address sequence for the ROM module. The ROM
supplies the complex input data to the FFT modules over a set number of clock cycles: for a
16-point pipelined FFT, 16 clock cycles are needed, and so on depending on the FFT size. The
simulation then begins. The simulated inputs and outputs can be displayed, usually as
waveforms, and the results obtained are verified against Matlab.
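The ROM file and address counter can be modelled in a few lines of Python. This is only a behavioral sketch: the file name, the one-word-per-line layout and the helper names are assumptions, chosen because that layout can be loaded by Verilog's `$readmemh` system task. The two sample words are the first two inputs listed in Table 1.

```python
import os
import tempfile

def write_rom_file(words, path):
    """Write one 32-bit hexadecimal word per line."""
    with open(path, "w") as f:
        for w in words:
            f.write(f"{w:08X}\n")

def read_rom_file(path):
    """Read the words back as integers, skipping blank lines."""
    with open(path) as f:
        return [int(line, 16) for line in f if line.strip()]

def feed_fft(rom, n_points=16):
    """Model the test-bench counter: one ROM word is presented per clock cycle."""
    for address in range(n_points):
        yield address, rom[address]

rom_path = os.path.join(tempfile.mkdtemp(), "fft_input.txt")  # hypothetical file name
write_rom_file([0xF9FD07AE, 0xFE3E0A83], rom_path)
rom = read_rom_file(rom_path)
cycles = list(feed_fft(rom, n_points=len(rom)))
```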

In this paper, three different FFT architectures were analyzed. The first architecture implements the
radix-8 algorithm. Although its control is more complicated, the design requires fewer twiddle factor
multiplications, which reduces the memory needed to store the twiddle factors. This is why FFT hardware
solutions favor higher radices. Radix-8 FFT uses the special twiddle factors $W_N^{N/8}$, $W_N^{3N/8}$,
$W_N^{5N/8}$ and $W_N^{7N/8}$ to reduce the algorithm complexity, as shown in Equations (3) and (4),
where a is cos(2π/N) and b is sin(2π/N).

$$(a + jb)\,W_N^{N/8} = -(a + jb)\,W_N^{5N/8} = \frac{\sqrt{2}}{2}\,[(a + b) + j(b - a)] \qquad (3)$$

$$(a + jb)\,W_N^{3N/8} = -(a + jb)\,W_N^{7N/8} = \frac{\sqrt{2}}{2}\,[(b - a) - j(b + a)] \qquad (4)$$

These complex multiplications can be manipulated using two real-constant multiplications and two 
additions. Constant multiplication can also be replaced by shift-and-add operation. 
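The simplification can be checked numerically. The sketch below evaluates the $W_N^{N/8}$ case with two real-constant multiplications and two additions, and shows one possible shift-and-add approximation of the constant $\sqrt{2}/2$. The choice of shift terms (0.70703125) is an illustrative assumption, not the decomposition used in the paper's hardware.

```python
import cmath
import math

def mul_w_n8(a, b):
    """(a + jb) * W_N^(N/8) with two real-constant multiplications and two additions."""
    c = math.sqrt(2) / 2
    return complex(c * (a + b), c * (b - a))

def scale_sqrt2_over_2(v):
    """Shift-and-add approximation of v * sqrt(2)/2 for an integer v.

    Uses 2^-1 + 2^-3 + 2^-4 + 2^-6 + 2^-8 = 0.70703125 (illustrative choice of terms).
    """
    return (v >> 1) + (v >> 3) + (v >> 4) + (v >> 6) + (v >> 8)

exact = (3 + 4j) * cmath.exp(-1j * cmath.pi / 4)   # W_N^(N/8) = exp(-j*pi/4)
fast = mul_w_n8(3, 4)
approx = scale_sqrt2_over_2(1024)                  # 724, against 1024 * 0.7071 = 724.08
```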

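The other two pipelined FFT designs taken for comparison are based on the radix-4 algorithm. The radix-4
algorithm exploits the four-way symmetry of the twiddle factors, $W_N^{nk+N/4} = -jW_N^{nk}$ and
$W_N^{nk+N/2} = -W_N^{nk}$, to minimize the number of complex multiplications. Its formula is given in
Equation (5). For m = 0, 1, 2, ..., N/4-1,

$$X[4m] = \sum_{n=0}^{N/4-1} \{x[n] + x[n + N/4] + x[n + N/2] + x[n + 3N/4]\}\, W_N^{0}\, W_{N/4}^{nm}$$

$$X[4m + 1] = \sum_{n=0}^{N/4-1} \{x[n] - jx[n + N/4] - x[n + N/2] + jx[n + 3N/4]\}\, W_N^{n}\, W_{N/4}^{nm}$$

$$X[4m + 2] = \sum_{n=0}^{N/4-1} \{x[n] - x[n + N/4] + x[n + N/2] - x[n + 3N/4]\}\, W_N^{2n}\, W_{N/4}^{nm}$$

$$X[4m + 3] = \sum_{n=0}^{N/4-1} \{x[n] + jx[n + N/4] - x[n + N/2] - jx[n + 3N/4]\}\, W_N^{3n}\, W_{N/4}^{nm} \qquad (5)$$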
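The radix-4 decomposition can be sketched in software as a recursive decimation-in-frequency routine. This is a textbook form written for illustration, with the hypothetical name `fft_radix4`; it shows the four add/subtract combinations and the twiddle multiplication, not the R4SDF or R4SDC pipeline structure.

```python
import cmath

def fft_radix4(x):
    """Recursive radix-4 decimation-in-frequency FFT; len(x) must be a power of four."""
    N = len(x)
    if N == 1:
        return list(x)
    q = N // 4
    branches = [[], [], [], []]
    for n in range(q):
        a, b, c, d = x[n], x[n + q], x[n + 2 * q], x[n + 3 * q]
        sums = (a + b + c + d,              # feeds X[4m]
                a - 1j * b - c + 1j * d,    # feeds X[4m + 1]
                a - b + c - d,              # feeds X[4m + 2]
                a + 1j * b - c - 1j * d)    # feeds X[4m + 3]
        for r in range(4):
            w = cmath.exp(-2j * cmath.pi * r * n / N)  # twiddle factor W_N^(r*n)
            branches[r].append(sums[r] * w)
    out = [0j] * N
    for r in range(4):
        out[r::4] = fft_radix4(branches[r])  # N/4-point FFT of each branch
    return out
```

For a 16-point input the recursion runs two levels deep, so only the first level needs non-trivial twiddle multiplications.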

From the radix-4 algorithm, two architectures were considered: R4SDF and R4SDC.
R4SDF provides high-speed operation, but it does not reduce hardware utilization or power
consumption. R4SDC gives better speed and power as well as reduced area, since its stages are connected
by commutators instead of feedback [9].

2.2. Signal-to-noise Module 

Figure 3 shows the methodology used to evaluate the SNR fitness [7], [12] of the FFT modules.

Figure 3. SNR fitness methodology


Initially, random input data x(n) are supplied to the reference FFT processor, which calculates the
outputs X1(k). This processor acts as the reference solution and is solved in the Matlab environment. Next,
with the same x(n), a second FFT processor recalculates the output X2(k). This second processor is the
Modelsim (Verilog) design. X1(k) and X2(k) are then supplied to the SNR module in Modelsim for error
calculation, based on Equation (2). Pnoise, the error e(k), is calculated using Equation (6).

$$e(k) = [R_1(k) - R_2(k)] + j[I_1(k) - I_2(k)] \qquad (6)$$


where R1 and I1 are the real and imaginary outputs of the reference (Matlab) FFT, and R2 and I2 are those
of the Modelsim FFT. Finally, the SNR at the output of the FFT is calculated using Equation (7).


$$SNR = 10 \log_{10}\left(\frac{\sum_{k=0}^{N-1} |X_1(k)|^2}{\sum_{k=0}^{N-1} |e(k)|^2}\right) \qquad (7)$$


where N is the size of the FFT. Equation (7) can be rewritten as Equation (8).


$$SNR = 10 \log_{10}\left(\frac{\sum_{k=0}^{N-1} \left\{[R_1(k)]^2 + [I_1(k)]^2\right\}}{\sum_{k=0}^{N-1} \left| [R_1(k) - R_2(k)] + j[I_1(k) - I_2(k)] \right|^2}\right) \qquad (8)$$
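The SNR calculation can be expressed compactly in software. The sketch below assumes that the sums in the numerator and the denominator are taken separately over all N outputs, so that the result is total reference power over total error power, consistent with Equation (2). The function name and the sample values are illustrative and are not taken from the paper's SNR module.

```python
import math

def snr_fft_db(reference, measured):
    """Total reference power over total error power, in dB."""
    signal = sum(r.real ** 2 + r.imag ** 2 for r in reference)
    noise = sum((r.real - m.real) ** 2 + (r.imag - m.imag) ** 2
                for r, m in zip(reference, measured))
    if noise == 0:
        return math.inf  # identical outputs: no measurable error
    return 10.0 * math.log10(signal / noise)

reference = [100 + 0j, 0 + 100j, -100 + 0j, 0 - 100j]
measured = [101 + 0j, 0 + 101j, -99 + 0j, 0 - 99j]   # each output off by 1
snr = snr_fft_db(reference, measured)                 # power ratio 40000 / 4, about 40 dB
```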


3. RESULTS AND ANALYSIS 

This section discusses the functional analysis of the pipelined FFT and the SNR results as the
wordlength is varied.

3.1. Pipelined FFT Functionality Analysis

Figure 4 shows sample inputs and outputs of the pipelined FFT from the Modelsim simulation. The input starts
when the START signal is '1' and the output starts when the RDY signal is '1'. There is a gap between the
first input data and the first output data because RDY can only be asserted once the input data have passed
through all the sub-modules in the circuit. This output delay is expected in a pipelined architecture.

Each input and output word of the pipelined FFT is separated into two parts: the 16 most significant bits
(MSBs) are the real part and the 16 least significant bits (LSBs) are the imaginary part. As shown in Figure 4,
DR and DI represent the real and imaginary inputs, while DOR and DOI represent the real and imaginary
outputs, respectively. The results are shown in hexadecimal. These hexadecimal values represent the complex
input in the time domain and the complex output in the frequency domain.
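A word in this layout can be split back into its real and imaginary parts as follows. The sketch assumes the 16-bit fields are two's-complement signed values, which the paper does not state explicitly; the helper names are hypothetical. The sample word is the first input listed in Table 1.

```python
def to_signed16(v):
    """Interpret a 16-bit value as two's complement (assumed number format)."""
    return v - 0x10000 if v & 0x8000 else v

def unpack_complex(word_hex):
    """Split an 8-digit hex word into (real, imag): 16 MSBs real, 16 LSBs imaginary."""
    word = int(word_hex, 16)
    return to_signed16(word >> 16), to_signed16(word & 0xFFFF)

real, imag = unpack_complex("F9FD07AE")   # (-1539, 1966)
```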



Figure 4. Sample input and output of pipelined FFT

Random input data are generated and passed through the FFT module in Matlab, producing the FFT
outputs. These inputs and outputs are taken as reference data. The same inputs are then supplied to the
pipelined FFT modules in Modelsim, and the outputs from this simulation are compared with the outputs
produced in Matlab. Table 1 shows the inputs and outputs from Matlab as the reference, together with the
hardware simulation results obtained from the three different 16-point pipelined FFTs. All inputs and outputs
are shown in hexadecimal. The original input data are in floating point, which the pipelined FFT hardware
cannot process, so a Matlab function is used to convert the input and output data to hexadecimal
digital data.


Table 1. Pipelined FFT Output from Software and Hardware Simulation

        Software simulation (Matlab)   Hardware simulation (FPGA)
Index   Input (Hex)   Output (Hex)     Radix-8 Output (Hex)   R4SDC Output (Hex)   R4SDF Output (Hex)
0       F9FD07AE      E605E3E5         E620E3ED               E61FE3EB             E5EBE3EB
1       FE3E0A83      EC53203B         EC75203B               EC752037             EC512037
2       F7DAF9F3      C4E61D3C         C5161D3B               C5171D3A             C4F31D3A
3       F6ECF6AE      3303E654         332BE657               3329E652             3305E652
4       0974020A      2C6BFE9B         2C8EFE99               2C8DFE99             2C69FE99
5       0147E6E8      15C41773         05E91775               15E91775             15C51775
6       090A0320      00C7D4A2         00ECD4A1               00EBD4A0             00C7D4A0
7       FC2E0057      04C71670         04EB166D               04EB1670             04C71670
8       ED7CE524      197A31DE         19A031E1               199E31DF             197B31DF
9       E957E65A      DDC7EEE5         DDE9EEE7               DDE9EEE7             DDC5EEE7
10      E7E8E957      CE0CEBD          CE2EEBCF               CE2DEBD0             CE09EBD0
11      EDE1E478      E1E10E85         E2130E87               E2110E86             E1ED0E86
12      0A4E0B63      007E1493         00A21495               00A11495             007D1495
13      EEC3EEB9      EC3416ED         EC5916ED               EC5916FD             EC3516ED
14      EBD80A3E      EA82DEEB         FAA8DEE9               EAA9DFEA             EA85DEEA
15      ECA6E60B      ED1D2A6E         ED3F2A71               ED3E2A70             ED1B2A70


As Table 1 shows, the final Modelsim simulation outputs closely match the outputs generated in Matlab
for all three 16-point pipelined FFTs used in this research. For example, the first output generated
in Matlab is E605E3F5, while radix-8 gives E620E3ED, R4SDC gives E61EE3EB and R4SDF gives
E5EBE3EB. The SNR analysis in the next part confirms this statement.

3.2. Signal-to-noise Ratio 

This sub-section discusses the SNR results for the three 16-point pipelined FFTs. The wordlength of the
real and imaginary outputs of the pipelined FFT is varied from 16 bits down to 8 bits. The number of
bits used affects all key design parameters, especially power, as well as speed and area. When the number
of bits is reduced, switching activity is also reduced, lowering the switched capacitance. This is desirable
for power-optimized design. Furthermore, a lower number of bits reduces the number of transfer lines and the
average interconnect length and capacitance.

Figure 5 shows the average SNR values for radix-8, R4SDC and R4SDF as the wordlength is varied.
Sixteen sets of randomly generated inputs were passed through the three pipelined FFT modules. The outputs of
these FFT modules, along with the reference outputs, were passed through the SNR modules. From the graph
obtained, the best SNR for radix-8 and R4SDC is at an 11-bit wordlength, while for R4SDF the highest SNR is
at a 16-bit wordlength. Table 2 shows the average SNR values for the three pipelined FFT architectures studied.
Radix-8 has the highest average SNR of the three. A higher SNR indicates better signal accuracy [1], [7],
[12].

Table 2. Average SNR Value


FFT Type    SNR (dB)
Radix-8     41.97
R4SDC       40.92
R4SDF       14.87


Figure 5. SNR versus wordlength 


4. CONCLUSION 

In conclusion, all the pipelined FFT architectures analyzed in this paper function correctly.
The radix-8 pipelined FFT processor has the highest SNR, 41.97 dB, of the architectures studied
in this paper. R4SDC also shows a very good SNR, above 40 dB, indicating an
excellent signal. Based on these results, pipelined FFTs implemented in hardware can achieve
good SNR and can be used in digital signal applications. This study can be extended in the
future to other FFT sizes and other architectures.